Tidy data

About the data

  • The data set is the Residents Profile, downloaded from the City of Melbourne council website. It covers seven topics: age and gender, economy, income, employment, education, housing and dwellings. It describes the residents of the ten suburbs of the City of Melbourne in the 2011 and 2016 census years. Each topic contains several sub-topics, which are the variables. All of these variables are stored in a single column, ‘category’, and are broken down further in the column ‘sub_category’. To make the data easier to use, we tidy it according to the questions we are interested in.

  • To clean the data, we first filter the original data down to the topics we are interested in. Then, working from the project questions, we filter again for the data relevant to each question and reshape it into a format that is easy to use later. Most of the data sets needed later are prepared in this cleaning step and can be used directly in the analysis.
library(tidyverse) # dplyr, tidyr, stringr and forcats are used below

# Tidy the four key age groups into one column each and calculate the total population
key_group_pop <- raw_residents %>%
  filter(category == "Age - key groups") %>%
  pivot_wider(names_from = sub_category, values_from = value) %>%
  rename(Children = starts_with("Children"), Youth = starts_with("Youth"), Adult = starts_with("Adult"), Older = starts_with("Older")) %>%
  mutate(Population = Children + Youth + Adult + Older)

age_group_pop <- raw_residents %>%
  filter(category %in% c("Age - key groups","Age - 5 year groups","Language spoken at home - detailed","Age - median")) %>%
  mutate(sub_category = str_replace_all(sub_category, c(" years" = "", "0-11" = "", "12-25" = "", "26-59" = "", " " = "", "[()]" = "", "60andover" = ""))) # A larger data set focused on age, with age ranges, spaces and brackets stripped from sub_category

# Helper for the tidying steps below: keep the given category and move its sub-categories into a column named after it
pivot_clean <- function(data, x) {
  data %>%
    filter(category %in% x) %>%
    mutate(subcategory_modified = fct_relevel(sub_category, unique(sub_category))) %>%
    pivot_wider(names_from = category, values_from = subcategory_modified)
}

year_group<-age_group_pop %>%
  pivot_clean("Age - 5 year groups") %>%
  rename(year_group = "Age - 5 year groups", year_person = "value") # detailed age data in five-year groups

key_group<- age_group_pop %>%
  pivot_clean("Age - key groups") %>%
  rename(key_group = "Age - key groups", key_person = "value") # the four key age groups

language <- age_group_pop %>%
  pivot_clean("Language spoken at home - detailed") %>%
  rename(language = "Language spoken at home - detailed", language_person = "value") %>%
  filter(!str_detect(language, "Englishonly|Other|Speak|notstated"), year == "2016", geography == "City of Melbourne" ) %>%
  mutate(language = str_replace_all(language, c("ChineseLanguages-" = "", "IndoAryanLanguages-" = "", "SoutheastAsianAustronesianLanguages-" = "", "excludingDari" = "", "AustralianIndigenousLanguages" = "AIL"))) # keep the languages spoken at home besides English and shorten their labels

age_median<- age_group_pop %>%
  pivot_clean("Age - median") %>%
  rename(age_median = "value") %>%
  select(geography, year, age_median) # median age by suburb and census year

edu<- raw_residents %>%
  pivot_clean("Education institution type attending - overview") %>%
  rename(education_type = "Education institution type attending - overview") %>%
  filter(!str_detect(education_type, "Other type of Educational Institution|Type of Educational Institution not stated")) # edu holds the number of students by education institution type

edu_level<- raw_residents %>%
  pivot_clean("Non-school qualification: level of education - overview") %>%
  rename(education_level = "Non-school qualification: level of education - overview") %>%
  filter(!str_detect(education_level, "Level of education inadequately described|Level of education not stated") ) %>%
  select(geography,year, value,education_level) %>%
  mutate(education_level = str_replace_all(education_level, " ", "_")) %>%
  pivot_wider(names_from = education_level, values_from = value) # edu_level holds the number of people at each level of education

ocp<- raw_residents %>%
  pivot_clean("Occupation") %>%
  filter(Occupation != "Inadequately described/Not stated")

ocp_model<- ocp %>%
  mutate(Occupation = str_replace_all(Occupation, " ", "_")) %>%
  select(geography, value, year, Occupation) %>%
  pivot_wider(names_from = Occupation, values_from = value) # tidy occupation data for modeling

eco<- raw_residents %>%
  pivot_clean("Personal income - median") %>%
  rename(median_income = "value") %>%
  select(geography,year,median_income) # focusing on median income 

rent<- raw_residents %>%
  pivot_clean("Housing rental weekly payments - overview") %>%
  rename(rental_payment = "Housing rental weekly payments - overview") %>%
  filter(rental_payment != "Rent not stated") %>%
  mutate(rental_payment = str_replace_all(rental_payment, c("\\$" = "", " " = "", "andover" = ""))) # weekly rent bands with "$", spaces and "and over" stripped, ready for use
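
To see what the pivot_clean helper returns, here is a minimal sketch on a made-up tibble. It assumes raw_residents has the columns geography, year, category, sub_category and value, as the tidying code suggests; toy_residents and all of its values are invented for illustration.

```r
library(dplyr)
library(tidyr)
library(forcats)

# Hypothetical stand-in for raw_residents; every value here is invented.
toy_residents <- tibble(
  geography    = c("Carlton", "Carlton", "Carlton"),
  year         = c(2016, 2016, 2016),
  category     = c("Occupation", "Occupation", "Personal income - median"),
  sub_category = c("Managers", "Professionals", "Median"),
  value        = c(10, 30, 500)
)

# Same helper as in the tidying code above.
pivot_clean <- function(data, x) {
  data %>%
    filter(category %in% x) %>%
    mutate(subcategory_modified = fct_relevel(sub_category, unique(sub_category))) %>%
    pivot_wider(names_from = category, values_from = subcategory_modified)
}

# Keeps only the "Occupation" rows and adds a column named "Occupation"
# that holds the sub-categories in their order of appearance.
toy_ocp <- pivot_clean(toy_residents, "Occupation")
print(toy_ocp)
```

The result keeps geography, year, sub_category and value and gains a column called Occupation, which is why the later steps can rename or filter on the category name directly.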

Population

Column

Summary

  • First, we look at the population of each suburb of the City of Melbourne and how it differs between 2011 and 2016. This chart compares the population of the 10 suburbs of the City of Melbourne and shows the change between the two census years.

  • Compared with 2011, most suburbs show population growth. Melbourne CBD had the largest growth (from 20,117 to 36,909 persons). East Melbourne and South Yarra are the only two areas where the population declined.
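
As a worked example of the comparison, the sketch below computes the absolute and percentage change for Melbourne CBD from the two figures quoted above. The column names pop_2011 and pop_2016 are assumptions made for this illustration, not names from the original data.

```r
library(dplyr)

# The two population figures are the ones quoted in the text;
# the column names are assumptions for this sketch.
pop <- tibble(
  geography = "Melbourne CBD",
  pop_2011  = 20117,
  pop_2016  = 36909
)

# Absolute change in persons and change relative to 2011 in percent.
pop_change <- pop %>%
  mutate(
    change     = pop_2016 - pop_2011,
    pct_change = 100 * change / pop_2011
  )

print(pop_change)
```

On these figures the CBD gained 16,792 residents, an increase of roughly 83% on its 2011 population.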

Column

Population comparison between 2011 and 2016 census

Age

Column

Summary

Key Age Group Comparison

  • This part turns to the topic of age. We want to compare the City of Melbourne and Greater Melbourne in terms of age groups from a macro perspective, so we calculated each of the four key age groups as a percentage of the total population, for each census year. Children refers to residents aged 0-11, Youth to residents aged 12-25, Adult to residents aged 26-59, and Older to residents aged 60 and over.

  • The percentage of each age group in the two regions changed little between the census years; what stands out is how the proportions of the age groups differ between the two regions. Adults are the largest group in both regions, at approximately 50% of the population. The proportion of Youth in the City of Melbourne (approx. 30% in 2016) is clearly higher than in Greater Melbourne (approx. 18% in 2016). The proportions of Older and Children in the City of Melbourne (approx. 10% and 5% in 2016) are far lower than in Greater Melbourne (approx. 20% and 15% in 2016).
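
The share of each key age group can be computed along these lines. This is a sketch under assumptions: the column names follow the key_group table from the tidying step, while "Region A" and the counts are invented so that the arithmetic is easy to follow.

```r
library(dplyr)

# Invented counts for one region and year; only the group names come from the text.
toy_key <- tibble(
  geography  = "Region A",
  year       = 2016,
  key_group  = c("Children", "Youth", "Adult", "Older"),
  key_person = c(50, 300, 550, 100)
)

# Share of each key age group in the region's total population, in percent.
# Grouping by geography and year keeps the totals separate per region and census.
key_share <- toy_key %>%
  group_by(geography, year) %>%
  mutate(share = 100 * key_person / sum(key_person)) %>%
  ungroup()

print(key_share)
```

Because the shares are computed within each region and year, regions of very different size, such as the City of Melbourne and Greater Melbourne, become directly comparable.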

Percentage change by groups

  • The proportions of each age group and their comparison between years do not make the change easy to see. This figure shows the percentage change of each age group from 2011 to 2016 in the City of Melbourne and Greater Melbourne.

  • The population growth in the City of Melbourne is much greater than in Greater Melbourne in every age group. The largest growth was in Youth, which grew 54% between 2011 and 2016, while the change in Greater Melbourne was only 8%. Adult comes second with a growth rate of 42%, against only 11% in Greater Melbourne. Children and Older grew by 36% and 35% respectively, against 12% and 16% in Greater Melbourne. For the City of Melbourne, the growth rate of every age group is striking. The desire to live in the city has not abated; it has become more intense. We look forward to the 2021 census: how much further will these percentages rise?
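
One way to get from long-format counts to a growth rate per age group is to spread the two census years into columns first. The sketch below assumes the key_group column names; the counts are invented and chosen only so that the resulting rates are round numbers.

```r
library(dplyr)
library(tidyr)

# Invented counts in long format: one row per age group and census year.
toy_counts <- tibble(
  geography  = "Region A",
  key_group  = rep(c("Youth", "Adult"), each = 2),
  year       = rep(c(2011, 2016), times = 2),
  key_person = c(100, 154, 200, 284)
)

# Put the two census years side by side, then compute growth relative to 2011.
growth <- toy_counts %>%
  pivot_wider(names_from = year, values_from = key_person, names_prefix = "y") %>%
  mutate(pct_change = 100 * (y2016 - y2011) / y2011)

print(growth)
```

The names_prefix argument avoids column names that start with a digit, so y2011 and y2016 can be used in mutate without backticks.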

Population pyramid

  • To better understand the age difference between the City of Melbourne and Greater Melbourne, we use the five-year age group data to build a population pyramid (Census 2016). The y-axis is the age group, and the x-axis is each age group's share of the total population of its region, which shows the age difference between the two regions more intuitively.

  • Like most population pyramids, both regions are narrow at the two ends and wider in the middle, but the most striking part is the 20-24 age group. This group accounts for about 7% of the population in Greater Melbourne, while its share in the City of Melbourne is over three times as large. In the 25-29 and 30-34 groups, the share in the City of Melbourne is twice that of Greater Melbourne. As a result, the City of Melbourne's share in most other age groups is smaller than Greater Melbourne's. The proportion of young age groups in the City of Melbourne is remarkable. We think it reflects not only that the City of Melbourne is a very dynamic region, but also that a large number of young people want to find their own place where opportunities are plentiful.
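
A two-region pyramid of this kind can be sketched with ggplot2 by mirroring one region onto the negative side of the axis. This is an illustration and not necessarily how the original chart was built; the shares below are invented and only three age groups are shown.

```r
library(dplyr)
library(ggplot2)

# Invented shares (percent of each region's population) for three age groups.
toy_pyramid <- tibble(
  year_group = factor(rep(c("15-19", "20-24", "25-29"), times = 2),
                      levels = c("15-19", "20-24", "25-29")),
  geography  = rep(c("City of Melbourne", "Greater Melbourne"), each = 3),
  share      = c(6, 22, 18, 6, 7, 8)
)

# Mirror one region onto the negative side so the bars grow in opposite directions.
pyramid_data <- toy_pyramid %>%
  mutate(signed_share = if_else(geography == "Greater Melbourne", -share, share))

# Horizontal bars; the axis labels show absolute values so both sides read as positive shares.
pyramid_plot <- ggplot(pyramid_data,
                       aes(x = year_group, y = signed_share, fill = geography)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = abs) +
  labs(x = "Age group", y = "Share of regional population (%)")
```

Printing pyramid_plot draws the chart. Using shares instead of counts is what makes two regions of very different size comparable on one axis.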

Age pattern inside

  • Finally, we return to the City of Melbourne to see how the age pattern differs between its suburbs. Here we again use the five-year age group data and compare 2011 with 2016. We have found that the City of Melbourne is a place where mostly young people live, but when we analyze the suburbs inside it, the age pattern of each suburb is very different. This 100% chart covers a lot of content. Each suburb is distinguished by color. The y-axis is the proportion, and the x-axis is the age group. The chart can be read in three steps. First, start from an age group on the x-axis and look from bottom to top: the sizes of the color blocks show the population difference between suburbs within that age group. Second, follow a single color block from left to right: how its area changes shows the age distribution of that suburb relative to the others. Finally, compare the shapes of the different color blocks from left to right: this shows how each suburb's share of the population changes with age and how that trend differs between suburbs.

  • Beyond the basic proportions, the shapes of the suburbs in this 100% chart deserve attention. Take the color blocks of Melbourne CBD, Parkville and South Yarra from left to right. Melbourne CBD has the largest population in the 15-34 age groups, where its color block is widest; its shape then gradually shrinks until it disappears, which means that Melbourne CBD has very few elderly residents. The color blocks of Parkville and South Yarra have the opposite shape. Before the age of 55 both blocks are very narrow, which means their share of the population is very small. After the age of 55 both blocks widen suddenly, which means that the elderly mainly live in these two suburbs of the City of Melbourne. Comparing the patterns of 2011 and 2016, the age distribution of the City of Melbourne has not changed much. It is worth noting that in the 2016 census, Southbank, Parkville, South Yarra, North Melbourne and South Melbourne have residents over 100 years old, so these suburbs may be good places for retirement.
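
The 100% chart described above can be approximated as a stacked area chart of within-age-group shares. The sketch assumes that chart type and the year_group column names from the tidying step; the suburbs "Suburb A" and "Suburb B" and their counts are invented.

```r
library(dplyr)
library(ggplot2)

# Invented counts for two suburbs across three age groups.
toy_age <- tibble(
  year_group  = factor(rep(c("15-19", "20-24", "55-59"), each = 2),
                       levels = c("15-19", "20-24", "55-59")),
  geography   = rep(c("Suburb A", "Suburb B"), times = 3),
  year_person = c(80, 20, 90, 10, 30, 70)
)

# Share of each suburb within every age group, so each age group sums to 1.
age_share <- toy_age %>%
  group_by(year_group) %>%
  mutate(share = year_person / sum(year_person)) %>%
  ungroup()

# geom_area needs a numeric x, so the ordered age groups are mapped to 1, 2, 3
# and relabelled with the group names on the axis.
area_plot <- ggplot(age_share,
                    aes(x = as.integer(year_group), y = share, fill = geography)) +
  geom_area() +
  scale_x_continuous(breaks = seq_along(levels(age_share$year_group)),
                     labels = levels(age_share$year_group)) +
  labs(x = "Age group", y = "Share of age group")
```

In this toy data Suburb A dominates the young groups and Suburb B the older one, which reproduces in miniature the narrowing and widening shapes discussed above.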

Column

Key Age Group Comparison

Percentage Change

Population Pyramid

Age Pattern Inside

Language

What languages do the residents of the City of Melbourne speak at home besides English?

We have explored some demographic features of the City of Melbourne. Melbourne is a multicultural city: no matter where you come from, you may find your own culture here. Language is a major feature of cultural diversity, so we are curious about which languages the residents of the City of Melbourne speak at home besides English.

Student

Column

Summary

  • In education, we want to know the distribution of students across education types and find out which suburb has the most university students. We divide education types into Primary, Pre-school, Secondary, Technical or Further Educational Institution, and University or other Tertiary Institution, and construct a 100% chart to compare the number of students in each suburb in 2016.

  • The figure shows that in every suburb, students in tertiary education make up the largest proportion. Melbourne CBD is where the most university students live, while East Melbourne had only 425 students attending a University or other Tertiary Institution in 2016.

Column

Distribution of education types of students across the City of Melbourne

Occupation

Column

Summary

  • In this part, we want to know the composition of occupations in the City of Melbourne, which occupations are the main ones, and whether the situation in the CBD is different.

  • The figure shows that the most common occupation type in the City of Melbourne is Professionals, with more than twice as many people as the second-ranked occupation, Managers, and the least common occupation type is Machinery operators and drivers. Professionals is also the major occupation in Melbourne CBD, and the distribution of occupation types in the two regions is very similar.

Column

The distribution of Occupation Types in City of Melbourne and CBD

Income

Column

Summary

  • Finally, let’s look at the median income in each suburb of the City of Melbourne in 2011 and 2016. Which suburb had the highest median income in 2016: is it the central business district? Compared with 2011, which suburbs saw their median income rise and which saw it fall? To answer these questions, we build a scatter plot that distinguishes median income by suburb and year.

  • The figure shows four suburbs where median income declined: Melbourne CBD, Carlton, Southbank and North Melbourne. Melbourne CBD’s median income is not as high as we expected, and it does not stand out compared with other suburbs. In 2016, the highest median income was in East Melbourne, at 1,322 Australian dollars per week, and the lowest was in Carlton, at 355 Australian dollars per week.
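
The rise-or-fall question can be answered per suburb before plotting. The sketch below assumes a table shaped like eco (geography, year, median_income); the suburbs and incomes are invented, and the plot is one plausible layout for such a scatter plot, not necessarily the original one.

```r
library(dplyr)
library(ggplot2)

# Invented weekly median incomes with the same columns as the eco table.
toy_eco <- tibble(
  geography     = rep(c("Suburb A", "Suburb B"), each = 2),
  year          = rep(c(2011, 2016), times = 2),
  median_income = c(900, 1000, 700, 650)
)

# Change from 2011 to 2016 per suburb; a negative change marks a decline.
income_change <- toy_eco %>%
  group_by(geography) %>%
  summarise(change = median_income[year == 2016] - median_income[year == 2011]) %>%
  mutate(declined = change < 0)

# One point per suburb and census year, coloured by year.
income_plot <- ggplot(toy_eco,
                      aes(x = median_income, y = geography, colour = factor(year))) +
  geom_point(size = 3) +
  labs(x = "Median weekly income (AUD)", y = NULL, colour = "Census year")

print(income_change)
```

In the plot, a suburb whose 2016 point sits to the left of its 2011 point is one where the median income fell, which matches the declined flag in income_change.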

Column

Median Income

References